Approximate word matches between two random sequences

by: Conrad J. Burden; Miriam R. Kantorovitz; Susan R. Wilson

Publication date: 2008-01-21

Collection: arxiv; additional_collections; journals

Language: English

Given two sequences over a finite alphabet $\mathcal{L}$, the $D_2$ statistic is the number of $m$-letter word matches between the two sequences. This statistic is used in bioinformatics for expressed sequence tag database searches. Here we study a generalization of the $D_2$ statistic in the context of DNA sequences, under the assumption of strand symmetric Bernoulli text. For $k
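As an illustration of the definition above, here is a minimal sketch of the exact-match $D_2$ count: the number of position pairs $(i, j)$ at which the $m$-letter word starting at $i$ in the first sequence equals the one starting at $j$ in the second. It assumes the sequences are plain Python strings; the function name `d2_exact` is hypothetical, and the sketch does not attempt the approximate-match generalization that the paper studies.

```python
from collections import Counter


def d2_exact(seq_a: str, seq_b: str, m: int) -> int:
    """Count exact m-letter word matches between two sequences.

    Every pair (i, j) with seq_a[i:i+m] == seq_b[j:j+m] counts once,
    so a word occurring p times in seq_a and q times in seq_b
    contributes p * q to the total.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    # Tally overlapping m-letter words in each sequence.
    counts_a = Counter(seq_a[i:i + m] for i in range(len(seq_a) - m + 1))
    counts_b = Counter(seq_b[j:j + m] for j in range(len(seq_b) - m + 1))
    # Sum the products of occurrence counts over words seen in seq_a;
    # a Counter returns 0 for words absent from seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())
```

Counting words per sequence first avoids comparing every pair of positions directly, which keeps the work roughly linear in the two sequence lengths for fixed $m$.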